This is the entry point for the paper “Measuring the Landscape of Civil War.” This file loads and processes the raw CSV file of the events dataset created for the Mau Mau rebellion.
library(MeasuringLandscapeCivilWar)
Loading required package: data.table
data.table 1.10.4.1
Loading required package: devtools
Loading required package: textreuse
Loading required package: LSHR
Loading required package: Matrix
Loading required package: rasterVis
Loading required package: raster
Loading required package: sp
Attaching package: ‘raster’
The following object is masked from ‘package:data.table’:
shift
Loading required package: lattice
Loading required package: latticeExtra
Loading required package: RColorBrewer
Loading required package: ggplot2
Attaching package: ‘ggplot2’
The following object is masked from ‘package:latticeExtra’:
layer
Loading required package: rgdal
rgdal: version: 1.2-13, (SVN revision 686)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 2.1.3, released 2017/20/01
Path to GDAL shared files: /usr/share/gdal
Loaded PROJ.4 runtime: Rel. 4.9.3, 15 August 2016, [PJ_VERSION: 493]
Path to PROJ.4 shared files: (autodetected)
Linking to sp version: 1.2-5
Loading required package: maptools
Checking rgeos availability: TRUE
Loading required package: plyr
Loading required package: glue
Loading required package: mosaic
Loading required package: dplyr
Attaching package: ‘dplyr’
The following object is masked from ‘package:glue’:
collapse
The following objects are masked from ‘package:plyr’:
arrange, count, desc, failwith, id, mutate, rename, summarise, summarize
The following objects are masked from ‘package:raster’:
intersect, select, union
The following objects are masked from ‘package:data.table’:
between, first, last
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: ggformula
Loading required package: mosaicData
The 'mosaic' package masks several functions from core packages in order to add
additional features. The original behavior of these functions should not be affected by this.
Note: If you use the Matrix package, be sure to load it BEFORE loading mosaic.
Attaching package: ‘mosaic’
The following objects are masked from ‘package:dplyr’:
count, do, tally
The following object is masked from ‘package:plyr’:
count
The following object is masked from ‘package:rgdal’:
project
The following objects are masked from ‘package:raster’:
mean, quantile, resample
The following object is masked from ‘package:Matrix’:
mean
The following objects are masked from ‘package:stats’:
binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test, quantile, sd, t.test, var
The following objects are masked from ‘package:base’:
max, mean, min, prod, range, sample, sum
Loading required package: stringr
Loading required package: stringi
Loading required package: lubridate
Attaching package: ‘lubridate’
The following object is masked from ‘package:plyr’:
here
The following objects are masked from ‘package:data.table’:
hour, isoweek, mday, minute, month, quarter, second, wday, week, yday, year
The following object is masked from ‘package:base’:
date
Loading required package: janitor
Attaching package: ‘janitor’
The following object is masked from ‘package:raster’:
crosstab
Loading required package: digest
Loading required package: tidyverse
Loading tidyverse: tibble
Loading tidyverse: tidyr
Loading tidyverse: readr
Loading tidyverse: purrr
Conflicts with tidy packages ----------------------------------------
arrange(): dplyr, plyr
as.difftime(): lubridate, base
between(): dplyr, data.table
collapse(): dplyr, glue
compact(): purrr, plyr
count(): dplyr, mosaic, plyr
cross(): purrr, mosaic
date(): lubridate, base
do(): dplyr, mosaic
expand(): tidyr, Matrix
extract(): tidyr, raster
failwith(): dplyr, plyr
filter(): dplyr, stats
first(): dplyr, data.table
here(): lubridate, plyr
hour(): lubridate, data.table
id(): dplyr, plyr
intersect(): lubridate, raster, base
isoweek(): lubridate, data.table
lag(): dplyr, stats
last(): dplyr, data.table
layer(): ggplot2, latticeExtra
mday(): lubridate, data.table
minute(): lubridate, data.table
month(): lubridate, data.table
mutate(): dplyr, plyr
quarter(): lubridate, data.table
rename(): dplyr, plyr
second(): lubridate, data.table
select(): dplyr, raster
setdiff(): lubridate, base
summarise(): dplyr, plyr
summarize(): dplyr, plyr
tally(): dplyr, mosaic
tokenize(): readr, textreuse
transpose(): purrr, data.table
union(): lubridate, raster, base
wday(): lubridate, data.table
week(): lubridate, data.table
yday(): lubridate, data.table
year(): lubridate, data.table
Loading required package: knitr
Loading required package: DT
Loading required package: magrittr
Attaching package: ‘magrittr’
The following object is masked from ‘package:purrr’:
set_names
The following object is masked from ‘package:tidyr’:
extract
The following object is masked from ‘package:raster’:
extract
Loading required package: rgeos
rgeos version: 0.3-25, (SVN revision 555)
GEOS runtime version: 3.6.1-CAPI-1.10.1 r0
Linking to sp version: 1.2-5
Polygon checking: TRUE
Loading required package: ggmap
Google Maps API Terms of Service: http://developers.google.com/maps/terms.
Please cite ggmap if you use it: see citation('ggmap') for details.
Attaching package: ‘ggmap’
The following object is masked from ‘package:magrittr’:
inset
Loading required package: bookdown
Loading required package: stringdist
Loading required package: sf
Linking to GEOS 3.6.1, GDAL 2.1.3, proj.4 4.9.3, lwgeom 2.3.3 r15473
Loading required package: viridis
Loading required package: viridisLite
Loading required package: rvest
Loading required package: xml2
Attaching package: ‘rvest’
The following object is masked from ‘package:purrr’:
pluck
The following object is masked from ‘package:readr’:
guess_encoding
Loading required package: re2r
devtools::load_all()
Loading MeasuringLandscapeCivilWar
# global_loads()
knitr::opts_knit$set(progress = TRUE, verbose = TRUE)
knitr::opts_chunk$set(fig.width = 12, fig.height = 8, warning = FALSE, message = FALSE, cache = TRUE)
options(width = 160)
events <- prep_events(fromscratch = FALSE)
Basic cleaning. The general format is DD.MM.YYYY. Sometimes multiple days are included, as in DD1/DD2.MM.YYYY (for example 01/02.12.1950). The year is sometimes written as YY and sometimes as YYYY. Reviewer note: the plots suggest that there are a number of typos in the dates. All dates should fall between 1951 and 1961.
#p_load(date)
events$event_date_clean <- events$event_date %>%
str_replace_all("[[:digit:]]+/", "") %>% #strip off extra day at the front 01/02.12.1950
str_replace_all("\\.", "/") %>% #Convert periods to slashes
trimws() %>% #trim whitespace
str_replace_all("/52", "/1952") %>% #convert 2 digit years to 4 digit years
str_replace_all("/53", "/1953") %>% #convert 2 digit years to 4 digit years
str_replace_all("/54", "/1954") %>% #convert 2 digit years to 4 digit years
str_replace_all("/55", "/1955") %>% #convert 2 digit years to 4 digit years
str_replace_all("/56", "/1956") %>% #convert 2 digit years to 4 digit years
str_replace_all("/19524", "/1954") %>% #clean typo
dmy() #Feed to lubridate
67 failed to parse.
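The cleaning steps can be traced on a few made-up strings. The sketch below re-expresses the pipeline in base R (`gsub()` and `as.Date()` in place of `str_replace_all()` and `dmy()`), so `clean_event_date()` and the sample values are illustrative assumptions, not part of the package.

```r
# Toy inputs (hypothetical, not taken from the dataset)
raw_dates <- c("02.12.1952", "01/02.12.1952", " 5.3.53 ", "17.11.524")

clean_event_date <- function(x) {
  x <- gsub("[[:digit:]]+/", "", x)         # drop a leading extra day such as "01/"
  x <- gsub(".", "/", x, fixed = TRUE)      # periods to slashes
  x <- trimws(x)
  for (yy in 52:56) {                       # two-digit years to four-digit years
    x <- gsub(paste0("/", yy), paste0("/19", yy), x, fixed = TRUE)
  }
  x <- gsub("/19524", "/1954", x, fixed = TRUE)  # known typo "524" for "54"
  as.Date(x, format = "%d/%m/%Y")           # unparseable strings become NA
}

cleaned <- clean_event_date(raw_dates)
print(cleaned)

# A date that already uses slashes throughout loses its day and month
all_slashes <- clean_event_date("02/12/1952")
print(all_slashes)
```

Note that the first step removes every run of digits followed by a slash, so a date written entirely with slashes is reduced to its year and does not parse. That may account for some of the failures reported above, although this has not been checked against the data.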
events %>% filter(is.na(event_date_clean)) %>% dplyr::select(starts_with("event_date")) %>% distinct() %>% print(n=40) #visualize errors
events$event_date_clean_year <- year(events$event_date_clean)
events$event_date_clean_year %>% tabyl() %>% round(3)
How often are event dates missing?
table(events$event_date=="")
FALSE
7946
The documents also have dates, sometimes spanning a period of time. We can use them to pin down missing event dates.
(events$document_date_type <- events$document_date %>%
tolower() %>%
mosaic::derivedFactor(
"unknown" = TRUE,
# The input was lower-cased above, so every pattern must be lower case to match
"missing" = str_detect(., "obscured|missing|illegible|xx"),
"on the" = str_detect(., "on the"),
"to" = str_detect(., " to"),
"for" = str_detect(., "for "),
"week" = str_detect(., "week"),
"week ending" = str_detect(., "week ending"),
"period" = str_detect(., "period"),
"fortnight" = str_detect(., "fortnight"),
"ending" = str_detect(., "ending"),
.method = "last", # when several conditions hold, the last one listed is used
.default = "unknown"
)
) %>% tabyl()
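With `.method = "last"`, the order of the conditions decides the label whenever several of them hold. The base R sketch below imitates that rule on made-up strings; `classify_last()` is a hypothetical helper written for illustration and is not how `derivedFactor()` is implemented.

```r
# Hypothetical helper: assign the label of the LAST rule whose pattern matches.
classify_last <- function(x, rules, default = "unknown") {
  x <- tolower(x)
  out <- rep(default, length(x))
  for (label in names(rules)) {
    hit <- grepl(rules[[label]], x, fixed = TRUE)
    out[hit] <- label  # later rules overwrite earlier ones
  }
  out
}

# Label = pattern, listed in the same relative order as in the chunk above
rules <- c("week" = "week", "week ending" = "week ending", "ending" = "ending")
doc_dates <- c("Week ending 5.3.53", "Week of 5.3.53", "5.3.53")
doc_types <- classify_last(doc_dates, rules)
print(doc_types)
```

In this sketch a "week ending" document is labelled "ending" because the more general rule comes last. If the more specific label is wanted, the specific rule would have to be listed after the general one.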
events$document_date_clean <- events$document_date %>% tolower() %>%
# Patterns are lower case because the input was lower-cased first
str_replace_all("fortnight ended |period|week ending|for |the |fortnight |ending |week |from |on ","") %>%
str_replace_all("(?<=\\d)(th|st|rd|nd)","") # strip ordinal suffixes (25th -> 25) only when they follow a digit
events <- events %>%
dplyr::select(-one_of("document_date_1","document_date_2")) %>% #separate will continue to add columns every time its run
separate(col=document_date_clean,
into=c("document_date_1","document_date_2"),
sep = " to|to |To | - ", remove=F, extra="drop", fill="right")
Unknown variables: `document_date_1`, `document_date_2`
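Stripping ordinal suffixes from document dates is easy to get subtly wrong, because "st", "nd", "rd" and "th" also occur inside month names. The sketch below, in base R with a made-up input string, compares an unanchored pattern with one that only removes a suffix directly after a digit.

```r
doc_date <- "25th march to 11th august 1953"

# Unanchored: also removes the "st" inside "august"
naive <- gsub("th|st|rd|nd", "", doc_date)

# Lookbehind: remove the suffix only when it directly follows a digit
anchored <- gsub("(?<=[0-9])(st|nd|rd|th)\\b", "", doc_date, perl = TRUE)

print(naive)
print(anchored)
```

The lookbehind form needs `perl = TRUE` in base R. The stringr functions use ICU regular expressions, which also support lookbehind.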
events$document_date_clean_1 <- events$document_date_1 %>%
str_replace_all("[[:digit:]]+/", "") %>% #strip off extra day at the front 01/02.12.1950
str_replace_all("\\.", "/") %>% #Convert periods to slashes
trimws() %>%
dmy()
2696 failed to parse.
events$document_date_clean_2 <- events$document_date_2 %>%
str_replace_all("[[:digit:]]+/", "") %>% #strip off extra day at the front 01/02.12.1950
str_replace_all("\\.", "/") %>% #Convert periods to slashes
trimws() %>%
dmy()
400 failed to parse.
events %>% filter(is.na(document_date_clean_1)) %>% dplyr::select(starts_with("document_date")) %>% distinct() %>% print(n=40) #visualize errors
parse_date_time(c("2016", "2016-04"), orders = c("Y", "Ym"))
[1] "2016-01-01 UTC" "2016-04-01 UTC"
parse_date_time(c("2016", "jan-55"), orders = c("Y", "Ym","bY"))
1 failed to parse.
[1] "2016-01-01 UTC" NA
parse_date_time("1904-jan", "yb") # Values such as "jan-54" do not parse. In ranges such as "25 march to 11 april 1953", the second date parses but the first does not.
All formats failed to parse. No formats found.
[1] NA
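One possible way to handle month-year strings such as "jan-54" is to parse them by hand. The base R sketch below is an assumption about how this could be done, not the approach used in the package: `parse_mon_yy()` is a hypothetical helper, it places every two-digit year in the 1900s, and it pins the date to the first day of the month.

```r
# Hypothetical helper for strings of the form "mon-yy", e.g. "jan-54".
parse_mon_yy <- function(x, century = 1900) {
  parts <- strsplit(tolower(trimws(x)), "-", fixed = TRUE)
  # month.abb is the built-in vector "Jan", "Feb", ..., independent of locale
  mon <- vapply(parts, function(p) match(p[1], tolower(month.abb)), integer(1))
  yy <- vapply(parts, function(p) suppressWarnings(as.integer(p[2])), integer(1))
  # An explicit format makes unparseable strings return NA instead of an error
  as.Date(sprintf("%04d-%02d-01", century + yy, mon), format = "%Y-%m-%d")
}

parsed <- parse_mon_yy(c("jan-54", "Mar-53", "foo-54"))
print(parsed)
```

Pinning to the first of the month is a loss of precision that should be recorded wherever the day matters.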
events$document_date_best_date <- events$document_date_clean_2
condition <- is.na(events$document_date_best_date)
events$document_date_best_date[condition] <- events$document_date_clean_1[condition]
(events$document_date_best_year <- year(events$document_date_best_date)) %>% tabyl() %>% round(3)
Only 666 records are missing a document date.
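The fallback logic (prefer the end date of the document period, otherwise use the start date) and the expected 1951 to 1961 range can be checked on a toy example. The values below are made up for illustration.

```r
start_date <- as.Date(c("1953-03-25", "1954-01-01", NA, "1971-06-01"))
end_date <- as.Date(c("1953-04-11", NA, NA, NA))

# Prefer the end date; fall back to the start date where the end date is missing
best_date <- end_date
missing_end <- is.na(best_date)
best_date[missing_end] <- start_date[missing_end]

# Flag years outside the period the events are expected to cover
best_year <- as.integer(format(best_date, "%Y"))
in_range <- !is.na(best_year) & best_year >= 1951 & best_year <= 1961

print(data.frame(best_date, best_year, in_range))
```

Rows where `in_range` is `FALSE` are either missing or likely typos and are candidates for manual review.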
Note that some of the event types in the codebook do not occur in the data. If a category has zero results, that is not a bug; it means the codebook needs to be updated.
cat("\014")
pacman::p_load(car, stringi, stringr, xtable, SnowballC) # p_load() comes from the pacman package
events$type_clean <- str_trim(stri_trans_totitle(events$type))
(events$type_clean_agglow <- events$type_clean %>%
str_trim() %>%
tolower() %>%
car::recode("
'desertion'='desertion';
'escape'='escape';
c('abduction','kidnapping','kidnap','kitnap','kindnap')='abduction';
c('assault','attack','assaulted','assaults','assualt','assult')='assault';
c('murder','elimination','kidnap / murder','')='murder';
c('arson','burn')='arson';
c('slashed','stampede')='cattle slashing';
'vandalism'='vandalism';
c('theft','thefts','thet','missing','lost','entry')='theft';
c('confiscate','sentenced')='punishment';
c('capture','captured')='rebel capture';
c('oath','oathing','recruitment','recruited')='oathing';
c('contact','caontact','contacts','drove off','drive off','drove off',
'chased off','broke up oathing','ambush')='contact';
c('patrol','police and kpr patrol','sweep')='patrol';
c('screening','sreening')='screening';
c('type')='unclassified'
")) %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
NAs introduced by coercion
(events$type_clean_aggmed <- car::recode(events$type_clean_agglow, "
c('abduction','assault','murder')='physical violence';
c('vandalism','arson','cattle slashing')='property destruction';
c('theft')='theft';
c('contact','screening','sreening','patrol','punishment')='security operations';
c('desertion','escape','unclassified')='unclassified';
")) %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
NAs introduced by coercion
(events$type_clean_agghigh <- car::recode(events$type_clean_aggmed, "
c('oathing','physical violence','property destruction','theft')='rebel activity';
c('rebel capture','security operations')='government activity';
")) %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
NAs introduced by coercion
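The three aggregation levels form a chain of lookups: each level maps the labels of the level below it and leaves unmatched labels unchanged. The base R sketch below shows that idea with named vectors on a handful of labels taken from the recode rules above; `recode_with_map()` is a hypothetical helper and is not a replacement for `car::recode()`.

```r
# Hypothetical helper: look up each value in a named vector, keep unmatched values.
recode_with_map <- function(x, map) {
  out <- unname(map[x])
  unmatched <- is.na(out)
  out[unmatched] <- x[unmatched]
  out
}

low <- c("murder", "arson", "theft", "patrol", "oathing", "escape")

med_map <- c(
  murder = "physical violence", arson = "property destruction",
  patrol = "security operations", escape = "unclassified"
)
high_map <- c(
  "oathing" = "rebel activity", "physical violence" = "rebel activity",
  "property destruction" = "rebel activity", "theft" = "rebel activity",
  "security operations" = "government activity"
)

med <- recode_with_map(low, med_map)
high <- recode_with_map(med, high_map)
print(data.frame(low, med, high))
```

Keeping the maps as data makes it straightforward to compare them against the codebook, for example with `setdiff(unique(low), names(med_map))`.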
Initiators are collapsed to just Rebels, Government, and Civilians.
cat("\014")
initiator_target_master_clean <- "
c('ammunition')= 'ammunition' ;
c('explosives', 'gelignite')= 'explosives' ;
c('arms', 'firearm', 'gun', 'pistol', 'rifle',
'ammunition', 'rifile', 'shotgun', 'verey pistol')= 'firearms' ;
c('axe','scabbard','weapons')= 'other weapons' ;
c('councillor', 'district commissioner', 'district officer', 'forest ranger', 'game ranger',
'game warden', 'government',
'government employees', 'port authority', 'public works department', 'screening team' , 'do',
'govrnment', 'wakamba screening team',
'do munuga','african do','dcmeru', 'colonial authorities' ,'govtemployee'
)= 'colonial authorities' ;
c('chief', 'elders', 'headman' , 'chief chostram','chief eliud', 'chief\\'s sentry'
)= 'tribal authorities' ;
c('buildings', 'cattle dip', 'duka', 'farms',
'garage', 'homes','huts', 'hotel', 'land rover', 'lorry', 'market', 'office', 'oxcart', 'property',
'pump house', 'sawmill', 'shops', 'stores',
'tractor', 'vehicle', 'windmill' , 'bullock\\'s farm','cattle boma','coffe trees','coffee trees',
'cuthouse','dairy farm','dip','house','household',
'houses','hut','instrument','labour camp post','labour huts','lorries','lucerne sheds','maize shamba',
'milk factory','pig sty','private property',
'property of civilians','shop','store','thika fishing camp','vehicles')= 'private property';
c('cash', 'funds', 'money' , 'conductor\\'s takings'
)= 'cash';
c('banana', 'barley', 'bran', 'cabbage', 'coffee', 'corn', 'cream', 'crops', 'dairy', 'food',
'fruit', 'grain', 'honey', 'maize',
'meat', 'milk', 'oats', 'posho', 'potatoes', 'sugar', 'vegetable', 'wheat',
'food','food etc','food store','food stores','foodstuffs','fruits','grains',
'grains+cloth +money','green maize cobs','potato','potato store',
'potatos','skimmed milk','sugar cane','sugar maize','vegetables','vegitable garden',
'vegitables','wheat bags','wheat store','wheet','whisky'
)= 'food';
c('beast', 'cattle', 'cow', 'herd', 'livestock', 'pig', 'sheep', 'steer', 'stock',
'animal', 'bulls','calf','calves','chicken','cows','donkey','goat','goats',
'head of cattle','head of cow','head of sheep','heifer','heifers',
'lamb','live stock','livestock','livestocks','masai herd','milk cow','ox','ox cart',
'oxen','ram','red poll cattle','shee','sheep or ox','steers','stocks'
)= 'livestock';
c('medical supplies', 'medicine', 'm&b tablets', 'medicines')= 'medicine';
c('bags', 'bedding', 'blankets', 'books', 'charcoal', 'cloth', 'clothing',
'cooking utensils', 'cutlery', 'equipment', 'farm implements',
'household items','instruments', 'iron', 'pails','petrol', 'provisions',
'oil', 'sacks', 'supplies', 'tarpaulin', 'thatch', 'timber',
'tobacco', 'tools', 'uniforms', 'wire', 'wireless set', 'whiskey',
'articles','bag','battery','bucket','ciga','cigarettes','clothes',
'clothing etc','cloths','dairy item','dairy record book','goods',
'material','oil+tins','provisionv','railway uniforms','supplies',
'tarpaulian','typewriter','v- drive belts', 'gunny bags'
)= 'supplies';
c('church')= 'church';
c('airstrip', 'bridges', 'half built village', 'roads', 'trenches', 'water tank',
'bridge', 'bridge broken', 'bridge damaged', 'infrastructure', 'milt property',
'miltproperty', 'prison camp','stn damaged'
)= 'infrastructure';
c('school', 'school','school building','school house','school property','schools')= 'school';
c('bg','kg','eg', 'guard','embu guard', 'farm guard', 'forest guard', 'home guard',
'ikandine guard', 'kathanjure guard', 'kijabe guard',
'kikuyu guard', 'masai guard', 'meru guard', 'nandi guard', 'nkubu guard',
'stock guard', 'tigoni guard','tp and eg patrol','hg','tp patrol','home guard patrol',
'm', 'm/g','m/g patrol','g',
'kathanjure hg','k g', 'ng',
'eg patrol', 'hg camp','hg leader','hg patrol','hg post','home','home guard','kg post'
)= 'home guard';
c('arab combat' , 'arab combat unit')= 'arab combat units';
c('asian combat', 'asian combat unit', 'asian combat team', 'second asian combat unit' )= 'asian combat units';
c('3 kar', '4 kar', '5 kar', '6 kar', '7 kar', '23 kar', '26 kar','k.a.r','k.p.r','k.a.r.',
'5th k.a.r','5kar','5 k.a.r','4th kar','kar' ) = 'Kings African Rifles';
c('devonshire regiment','devons', 'field intelligence assistant', 'field intelligence officer',
'fio', 'gloucestershire regiment', 'glosters', 'lancashire fusiliers', 'king\\'s shropshire light infantry',
'royal east kent regiment', 'buffs', 'royal fusiliers', 'royal highland regiment','black watch',
'watch', 'royal inniskilling fusiliers', 'royal irish fusiliers', 'royal northumberland fusiliers',
'rnf','police and military', 'army' , 'lancashire fusilliers', 'sp company 1 royal innisks',
'1 rnf', 'rif', 'ksli', 'inniskillings', 'fia','1 glosters', '1 bw', '1 buffs',
'\"a\" company 1 royal innisks',
'\"a\" company', 'royal fusilers', 'of devons','of 1 glosters', 'lanc fus', 'fusiliers',
'fio kruger','fios','a co devon','4 platoon support company',
'\"c\" company1 royal innisks','6 platoonsp company 1 royal innisks','1 lf',
'\"c\" company',
'\"d\" company','\"a\"','\"a\" company bw','buffs ambush','d company','d\\' force','devens',
'c company','\"d\" force',
'army officer',
'british army officer',
'british military',
'buffs patrol',
'european officer',
'european soldiers',
'gloster patrol'
)= 'british military';
c('kenya regiment','captain folliott’s team' , 'kr', 'kenreg', 'kenregg','kenya regiment sergeant',
'kenya regt','keniya regiment','kenya regiment private')= 'kenya regiment';
c('captain', 'company', 'military', 'army', 'military property', 'platoon', 'security forces',
'security force', 'coy', 'striking force' ,'sentry',
'military (generic)', 'non commissioned officers', 'patrol', 'sentrie', 'sgt white'
)= 'military (generic)';
c('pseudo gang', 'pseudo team', 'trojan', 'psuedo gangs', 'trojan team' , 'tracker group',
'pseudo teams')= 'psuedo gangs';
c('raf', 'bombers', 'air strike', 'harvards', 'raf lincolns','flying squard')='royal air force';
c('general service unit', 'gsu' )= 'paramilitary';
c('cid')='cid';
c('kenya police', 'kp' , 'kp constables\\' quarters', 'kpa'
)= 'kenya police';
c('kenya police reserve', 'kpr', 'kpr officers', 'reserve police officer', 'rpo' ,
'rpos', 'police and k.p.r')= 'kenya police reserve';
c('constable', 'police', 'polce','policy party')= 'police (generic)';
c('railway police' )= 'railway police';
c('special branch', 'blue doctor team', 'special branch team', 'sb officers' )= 'special branch';
c('githumu police', 'masai special constable', 'tribal police', 'tp' , 'tpeg',
'african constable', 'african costable', 'african special constable', 'tribal police'
)= 'tribal police';
c('tribal police reserve', 'tpr') = 'tribal police reserve';
c('manyatta', 'fishing camp', 'sublocation', 'village', 'camp' , 'villages')= 'communities';
c('detainees', 'prisoner', 'prisoners'
)= 'detainees';
c('bandits', 'food foragers', 'gangs', 'gang', 'kiama kia muingi' , 'kkm', 'komerera' , 'mau mau', 'oath administrator', 'passive wing',
'rebels', 'suspects', 'terrorists','terrorosts','terrorist', 'gunman', 'terorist', 'gunmen',
'resistance group','resistance groups', 'oath administrater','oath administrators','passive wing members','resistance','suspect',
'suspected insurgents','terroist','terroists','terrost') = 'suspected insurgents';
c('africans', 'children', 'civilian','civilians', 'driver', 'employees', 'evangelist',
'family', 'farm boys', 'girls', 'informer',
'kikuyu', 'laborour', 'loyalist', 'masai', 'men', 'mission staff', 'owner', 'passengers',
'people', 'tugen tribesmen' , 'stranger', 'sikh',
'herd boys', 'isiolo game scouts', 'farm labour', 'farmer', 'european', 'employer',
'employee', 'civilan','shopkeeper' , 'students', 'teachers',
'turkana', 'vigilantes', 'women', 'workers','villagers', 'labour', 'local labour',
'kikuyus', 'embu', 'tiriki houseboy', 'samburu', 'manager', 'woman',
'vetofficer', 'mrhiggins', 'masai party','kuria tribesmen','manager of akira estates',
'kuria tribesmen','chstephen','african',
'catholic misson staff', 'african staff', 'asian women', 'bus conductor', 'child',
'civilian(food carriers)', 'civilian(schoolmaster)', 'civilians',
'civilion', 'committee', 'committee member', 'courier','elder','embu tractor driver',
'employees of club','engine boy','girl','golf club staff','his own hut',
'hotel keeper','houseboy','illegal residents','indian','interpreter','kem','kikiyu',
'kikuyu assessor','kikuyu families','kikuyu houseboy','kikuyu labourer','kikyu',
'kirua village','labour line','labour lines','labourer','labourers',
'laboures','labourline','labours','males','man','maragoli','maragoli labourer',
'masai elders','masai tribesman','members of the thika committee',
'mna section leaders','municipal inspectors','non kikuyu employees','person',
'prostitutes','purke masai','pwd employee','railway employees',
'school master','school teacher','sisters committee','somali','staff','strangers',
'taxi drivers','teacher','treasurers',
'headman\\'s son','norton traill\\'s labour','gordon\\'s labour', 'food carriers'
) = 'civilians';
c('')=NA
"
regex <- "\\.|patrol|[1-9]\\s*rd|[1-9]\\s*th" # with regex start trying to get more of these to automatically map instead of generating lots of hand codings
events$initiator_clean <- events$initiator %>% str_trim() %>% tolower() %>% gsub(regex, "", .) # lower-case first: the patterns and recode keys below are all lower case
events <- events %>%
dplyr::select(-one_of("initiator_clean_1", "initiator_clean_2", "initiator_clean_3")) %>% # separate will continue to add columns every time its run
separate(
col = initiator_clean,
into = c("initiator_clean_1", "initiator_clean_2", "initiator_clean_3"),
sep = "\\band\\b|\\\\|/|\\&|,", remove = FALSE, extra = "drop", fill = "right" # \\b keeps "and" from splitting inside names such as "nandi guard"
)
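One thing to watch when splitting actor strings on the word "and" is that an unanchored pattern also matches inside names. The base R sketch below illustrates the difference on made-up inputs; `split_actors()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: split on "and" as a whole word, or on \ / & and commas.
split_actors <- function(x) {
  parts <- strsplit(x, "\\band\\b|\\\\|/|&|,", perl = TRUE)[[1]]
  parts <- trimws(parts)
  parts[nzchar(parts)]  # drop empty pieces
}

two_actors <- split_actors("police and kpr")
kept_whole <- split_actors("nandi guard/kar")
three_actors <- split_actors("tp, eg & hg")

# Without word boundaries, "nandi" is cut in two
broken <- strsplit("nandi guard", "and")[[1]]

print(two_actors)
print(kept_whole)
print(three_actors)
print(broken)
```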
events <- events %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*police.*", "police", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*guard.*", "guard", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*terror.*|.*mau mau.*|.*gang.*", "terrorist", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*kpr.*|.*k p r.*", "kpr", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*kar.*|.*k a r.*", "kar", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*coy.*", "coy", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*gsu.*", "gsu", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(gsub(".*watch.*", "watch", .))) %>%
mutate_at(vars(starts_with("initiator_clean_")), funs(trimws(.)))
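The repeated `gsub(".*pattern.*", label, .)` calls all do the same thing: if a string contains the pattern anywhere, the whole string is replaced by a short label. The base R sketch below expresses that as a loop over a named vector of patterns, using a subset of the patterns above; `collapse_actor()` and the sample strings are illustrative assumptions.

```r
# Label = pattern; applied in order, like the chain of mutate_at() calls
patterns <- c(
  police = "police",
  guard = "guard",
  terrorist = "terror|mau mau|gang",
  kpr = "kpr|k p r",
  kar = "kar|k a r"
)

# Hypothetical helper: replace the whole string by the label of any matching pattern.
collapse_actor <- function(x, patterns) {
  for (label in names(patterns)) {
    hit <- grepl(patterns[[label]], x)
    x[hit] <- label
  }
  trimws(x)
}

actors <- c("kenya police patrol", "mau mau gang", "5 k a r", "home guard post", " chief ")
collapsed <- collapse_actor(actors, patterns)
print(collapsed)
```

Because the patterns match anywhere in the string, short ones such as "kar" or "coy" can also hit unrelated words, so the result is worth tabulating before recoding.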
events <- events %>%
mutate(initiator_clean_1_agglow = car::recode(initiator_clean_1, initiator_target_master_clean)) %>%
mutate(initiator_clean_2_agglow = car::recode(initiator_clean_2, initiator_target_master_clean)) %>%
mutate(initiator_clean_3_agglow = car::recode(initiator_clean_3, initiator_target_master_clean))
NAs introduced by coercion
NAs introduced by coercion
NAs introduced by coercion
# sort(table(events$initiator_clean_1_agglow))
lowlevelagg <- c(
"arab combat units", "cid", "psuedo gangs", "asian combat units", "special branch",
"tribal authorities", "tribal police reserve", "royal air force",
"paramilitary", "kenya regiment", "tribal police", "kenya police reserve", "kenya police",
"british military", "civilians", "Kings African Rifles", "military (generic)", "police (generic)",
"railway police", "home guard", "colonial authorities", "suspected insurgents"
)
# events <- events %>%
# mutate(initiator_clean_1_agglow=ifelse(initiator_clean_1_agglow %in% lowlevelagg & !is.na(initiator_clean_1_agglow),initiator_clean_1_agglow, "uncategorized")) %>% mutate(initiator_clean_2_agglow=ifelse(initiator_clean_2_agglow %in% lowlevelagg & !is.na(initiator_clean_2_agglow),initiator_clean_2_agglow, "uncategorized")) %>% mutate(initiator_clean_3_agglow=ifelse(initiator_clean_3_agglow %in% lowlevelagg & !is.na(initiator_clean_3_agglow),initiator_clean_3_agglow, "uncategorized"))
# table(events$initiator_clean_1_agglow, useNA="always")
events[, c("initiator_clean_1_aggmed", "initiator_clean_2_aggmed", "initiator_clean_3_aggmed")] <-
events[, c("initiator_clean_1_agglow", "initiator_clean_2_agglow", "initiator_clean_3_agglow")]
events <- events %>%
mutate_at(
# starts_with() matches a literal prefix, not a regex, so the columns are named explicitly
vars(initiator_clean_1_aggmed, initiator_clean_2_aggmed, initiator_clean_3_aggmed),
# the `.` passes each selected column to car::recode()
.funs = funs(car::recode(., "
c('cid','kenya police reserve','kenya police','police (generic)','railway police','special branch',
'tribal police','tribal police reserve') = 'police';
c('arab combat units','asian combat units','british military','Kings African Rifles',
'kenya regiment','military (generic)','psuedo gangs','royal air force') = 'military';
c('colonial authorities', 'tribal authorities')='civil authorities'
"))
)
events$initiator_clean_2_aggmed %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
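When selecting several columns by name, it helps to remember that a prefix match is literal: a string containing `|` is compared character for character and is not treated as a regular expression. To my understanding this is also how `starts_with()` behaves. The base R sketch below shows the difference and then recodes only the selected columns of a made-up data frame; the `to_group` map is a small illustrative subset of the rules above.

```r
cols <- c(
  "initiator_clean_1_aggmed", "initiator_clean_2_aggmed",
  "initiator_clean_3_aggmed", "initiator_clean_1_agglow"
)

# Literal prefix: no column name starts with the text "…aggmed|initiator…"
prefix_hits <- startsWith(cols, "initiator_clean_1_aggmed|initiator_clean_2_aggmed")
# Regular expression: selects the three aggmed columns
regex_hits <- grepl("^initiator_clean_[1-3]_aggmed$", cols)

toy <- data.frame(
  initiator_clean_1_aggmed = c("cid", "kenya regiment"),
  initiator_clean_2_aggmed = c("tribal police", NA),
  stringsAsFactors = FALSE
)
to_group <- c("cid" = "police", "tribal police" = "police", "kenya regiment" = "military")

# Recode the selected columns only; unmatched and missing values are left as they are
sel <- grepl("^initiator_clean_[1-3]_aggmed$", names(toy))
toy[sel] <- lapply(toy[sel], function(v) {
  out <- unname(to_group[v])
  ifelse(is.na(out), v, out)
})
print(toy)
```

A selection that matches no columns makes the recode a silent no-op, so it is worth checking `sum(regex_hits)` before relying on the result.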
events[, c("initiator_clean_1_agghigh", "initiator_clean_2_agghigh", "initiator_clean_3_agghigh")] <-
events[, c("initiator_clean_1_aggmed", "initiator_clean_2_aggmed", "initiator_clean_3_aggmed")]
events <- events %>%
mutate_at(
# starts_with() matches a literal prefix, not a regex, so the columns are named explicitly
vars(initiator_clean_1_agghigh, initiator_clean_2_agghigh, initiator_clean_3_agghigh),
# the `.` passes each selected column to car::recode()
.funs = funs(car::recode(., "
c('civil authorities', 'home guard', 'military', 'police', 'paramilitary') ='government';
c('suspected insurgents') ='rebels';
"))
)
events$initiator_clean_3_agghigh %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
regex <- "\\.|patrol|[1-9]\\s*rd|[1-9]\\s*th" # with regex start trying to get more of these to automatically map instead of generating lots of hand codings
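events$target_clean <- events$target %>% str_trim() %>% tolower() %>% gsub(regex, "", .) # was events$initiator; assumes the raw column is named `target`
events <- events %>%
dplyr::select(-one_of("target_clean_1", "target_clean_2", "target_clean_3")) %>% # separate() adds the columns again every time it is run
separate(
col = target_clean, # was initiator_clean, which copied the initiators into the target columns
into = c("target_clean_1", "target_clean_2", "target_clean_3"),
sep = "\\band\\b|\\\\|/|\\&|,", remove = FALSE, extra = "drop", fill = "right" # \\b keeps "and" from splitting inside names such as "nandi guard"
)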
events <- events %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*police.*", "police", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*guard.*", "guard", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*terror.*|.*mau mau.*|.*gang.*", "terrorist", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*kpr.*|.*k p r.*", "kpr", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*kar.*|.*k a r.*", "kar", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*coy.*", "coy", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*gsu.*", "gsu", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(gsub(".*watch.*", "watch", .))) %>%
mutate_at(vars(starts_with("target_clean_")), funs(trimws(.)))
events <- events %>%
mutate(target_clean_1_agglow = car::recode(target_clean_1, initiator_target_master_clean)) %>%
mutate(target_clean_2_agglow = car::recode(target_clean_2, initiator_target_master_clean)) %>%
mutate(target_clean_3_agglow = car::recode(target_clean_3, initiator_target_master_clean))
NAs introduced by coercion
NAs introduced by coercion
NAs introduced by coercion
lowlevelagg <- c(
"church", "kenya police", "medicine", "tribal police reserve", "detainees", "kenya regiment", "other weapons",
"paramilitary", "ammunition", "communities", "british military", "military (generic)", "tribal authorities", "kenya police reserve", "tribal police",
"Kings African Rifles", "infrastructure", "school", "cash", "colonial authorities", "police (generic)", "supplies", "firearms", "food", "private property",
"home guard", "civilians", "livestock", "suspected insurgents"
)
# events <- events %>%
# mutate(target_clean_1_agglow=ifelse(target_clean_1_agglow %in% lowlevelagg & !is.na(target_clean_1_agglow),target_clean_1_agglow, "uncategorized")) %>% mutate(target_clean_2_agglow=ifelse(target_clean_2_agglow %in% lowlevelagg & !is.na(target_clean_2_agglow),target_clean_2_agglow, "uncategorized")) %>% mutate(target_clean_3_agglow=ifelse(target_clean_3_agglow %in% lowlevelagg & !is.na(target_clean_3_agglow),target_clean_3_agglow, "uncategorized"))
events$target_clean_1_agglow %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
events[, c("target_clean_1_aggmed", "target_clean_2_aggmed", "target_clean_3_aggmed")] <-
events[, c("target_clean_1_agglow", "target_clean_2_agglow", "target_clean_3_agglow")]
events <- events %>%
mutate_at(
# Recode the target columns (the original selected the initiator columns and referenced an undefined `temp`)
vars(target_clean_1_aggmed, target_clean_2_aggmed, target_clean_3_aggmed),
.funs = funs(car::recode(., "
c('cid','kenya police reserve','kenya police','police (generic)','railway police',
'special branch','tribal police','tribal police reserve') = 'police';
c('arab combat units','asian combat units','british military','Kings African Rifles',
'kenya regiment','military (generic)','psuedo gangs','royal air force') = 'military';
c('colonial authorities', 'tribal authorities')='civil authorities';
c('ammunition','firearms','other weapons')='armaments';
c('cash','food','livestock','medicine','supplies')='provisions';
c('church','school','infrastructure')='public buildings';
"))
)
events$target_clean_1_aggmed %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
events[, c("target_clean_1_agghigh", "target_clean_2_agghigh", "target_clean_3_agghigh")] <-
events[, c("target_clean_1_aggmed", "target_clean_2_aggmed", "target_clean_3_aggmed")]
events <- events %>%
mutate_at(
# starts_with() matches a literal prefix, not a regex, so the columns are named explicitly
vars(target_clean_1_agghigh, target_clean_2_agghigh, target_clean_3_agghigh),
# the `.` passes each selected column to car::recode()
.funs = funs(car::recode(., "
c('civil authorities', 'home guard', 'military', 'police', 'paramilitary') ='government';
c('suspected insurgents','detainees') ='rebels';
c('armaments','private property','provisions','public buildings') ='property';
c('communities')='civilians';
"))
)
events$target_clean_1_agghigh %>%
tabyl(sort = TRUE) %>%
adorn_crosstab(digits = 1)
Helper function for recoding
recoderFunc <- function(data, oldvalue, newvalue) {
# convert any factors to characters
if (is.factor(data)) data <- as.character(data)
if (is.factor(oldvalue)) oldvalue <- as.character(oldvalue)
if (is.factor(newvalue)) newvalue <- as.character(newvalue)
# create the return vector
newvec <- data
# put recoded values into the correct position in the return vector
# match() takes the first replacement when an old value is listed more than once
for (i in unique(oldvalue)) newvec[data %in% i] <- newvalue[match(i, oldvalue)]
newvec
}
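A short usage example may help here. The block below repeats the helper so that it runs on its own, in a variant that takes the first replacement when an old value is listed twice (an assumption about the intended behaviour), and applies it to made-up count strings.

```r
recoderFunc <- function(data, oldvalue, newvalue) {
  # convert any factors to characters
  if (is.factor(data)) data <- as.character(data)
  if (is.factor(oldvalue)) oldvalue <- as.character(oldvalue)
  if (is.factor(newvalue)) newvalue <- as.character(newvalue)
  newvec <- data
  # first listed replacement wins if an old value appears more than once
  for (i in unique(oldvalue)) newvec[data %in% i] <- newvalue[match(i, oldvalue)]
  newvec
}

raw_counts <- c("1 cow", "20-30", "7", "1 cow", "??")
old <- c("1 cow", "20-30", "??", "1 cow")  # "1 cow" is deliberately listed twice
new <- c("1", "25", "", "1")

recoded <- recoderFunc(raw_counts, old, new)
counts <- as.numeric(recoded)  # "" becomes NA without a warning
print(recoded)
print(counts)
```

Values that are not listed in `old`, such as "7", pass through unchanged, so only the irregular entries need a hand coding.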
# These numbers are improvised and can be changed
acouple <- 2
afew <- 3
agang <- 6
agang_large <- 12
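Hand codings for counts can be stored as a flat vector of alternating old and new values, and part of the work can be automated. The base R sketch below first unpacks such a flat vector into its two halves and then computes the midpoint of simple numeric ranges; `range_midpoint()` is a hypothetical helper, the sample values are made up, and fractional midpoints are left unrounded on purpose.

```r
# Flat vector of alternating old/new values
pairs <- c("1 cow", "1", "10 to 12", "11", "30-40", "35", "??", "")
old <- pairs[seq(1, length(pairs), by = 2)]
new <- pairs[seq(2, length(pairs), by = 2)]

# Hypothetical helper: midpoint of "a to b", "a-b" or "a/b"; NA for anything else.
range_midpoint <- function(x) {
  pat <- "^ *([0-9]+) *(to|-|/) *([0-9]+) *$"
  is_range <- grepl(pat, x)
  lo <- as.numeric(sub(pat, "\\1", x[is_range]))
  hi <- as.numeric(sub(pat, "\\3", x[is_range]))
  out <- rep(NA_real_, length(x))
  out[is_range] <- (lo + hi) / 2
  out
}

midpoints <- range_midpoint(c("10 to 12", "30-40", "20/30", "3/4/2013", "4 bags"))
print(data.frame(old, new))
print(midpoints)
```

Strings with two slashes, such as "3/4/2013", are deliberately not treated as ranges. They look like dates and may be ranges that a spreadsheet converted automatically, so they need a manual decision.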
recodings <- c(
"100+", "100",
"??", "",
"1 bag", "1",
"1 blanket", "1",
"1 burnt down", "1",
"1 civilian", "1",
"1 cow, 6 sheep", "7",
"1 cow", "1",
"1 goat, clothing", "1",
"1 goat", "1",
"1 looted", "1",
"1 looted", "1",
"1 ox", "1",
"1 sheep and chickens", "1",
"1 sheep, some chickens", "1",
"1 sheep", "1",
"1 shotgun ,30 rounds", "31",
"1 shotgun + 10rds", "11",
"1 steer", "1",
"1 village, 1 market", "1",
"1 wounded", "1",
"1 wrecked", "1",
"1+", "1",
"1+3", "4",
"1+some", "1",
"10 acres", "10",
"10 bags", "10",
"10 cattle", "10",
"10 sacks", "10",
"10 to 12", "11",
"10 to 15", "13",
"10/14/2013", "",
"10/15/2013", "",
"10/20/2013", "",
"100 lb", "100",
"100-130", "115",
"100-150", "125",
"100+", 100,
"10000", "",
"109 cattle", "109",
"10bags potatoes", "10",
"11 cattle", "11",
"11 sheep", "11",
"112 bore & 20.1.45 &7 rds", "112",
"12 bags", "12",
"12 cattle", "12",
"12 goats", "12",
"12 to 15", "13",
"12 to 20", "17",
"12/14/2013", "",
"120 cattle", "120",
"120+1", "121",
"13 sheep", "13",
"13-15", "14",
"1300 worth", "1300",
"14 cattle", "14",
"14 goats", "14",
"14 head", "14",
"14+", "14",
"15 - 20", "18",
"15 cattle", "15",
"15 to 20", "17",
"15 to 20", "17",
"15 to 25", "20",
"15-20", "17",
"15+", "15",
"150-200", "175",
"150+", "150",
"151 cattle", "151",
"17 cattle", "17",
"172 bags burnt", "172",
"18 cattle", "18",
"19 bags", "19",
"196 rounds", "196",
"2 bags maize", "2",
"2 bags", "2",
"2 bags", "2",
"2 buckets", "2",
"2 cattle hamstrung", "2",
"2 cattle, corn", "3",
"2 cattle", "2",
"2 cows", "2",
"2 debbies", "2",
"2 goats", "2",
"2 groups", "2",
"2 huts burnt", "2",
"2 sheep", "2",
"2 watches, cash", "2",
"2/3/2013", "",
"2+", "2",
"20 bags maize, 9 goats, 32 chickens and ducks, cash", "60",
"20 bags", "20",
"20 cattle", "20",
"20 goats", "20",
"20 sheep", "20",
"20 to 25", "23",
"20 to 30", "25",
"20 to 40", "30",
"20-25", "23",
"20-30", "25",
"20-35", "30",
"20-50", "35",
"20/30", "25",
"20/30", "25",
"20+", "20",
"200 yds", "200",
"200-300", "250",
"200+", "200",
"2000 acres", "2000",
"21 goats", "21",
"21 head", "21",
"22 cattle", "22",
"25 to 30", "28",
"25-30", "27",
"25-30", "27",
"28 killed", "28",
"28 sheep", "28",
"3 bags", "3",
"3 bags", "3",
"3 bikes", "3",
"3 cattle", "3",
"3 cattle", "3",
"3 goats", "3",
"3 or 4", "3",
"3 or 4", "3",
"3 pangas", "3",
"3 sheep, 2 calves", "5",
"3 sheep", "3",
"3 to 4", "3",
"3 to 4", "3",
"3/10/2013", "",
"3/4/2013", "",
"3/5/2013", "",
"3/6/2013", "",
"3+", "3",
"3+3+1+2", "9",
"3+some", "3",
"30 acres", "30",
"30 cattle", "30",
"30 to 40", "35",
"30-35", "33",
"30-40", "35",
"30-50", "40",
"30+", "30",
"300-400", "350",
"300+", "300",
"35 bags", "35",
"35 to 40", "37",
"38 cattle", "38",
"3or 4", "3",
"4 bags potatoes", "4",
"4 bags", "4",
"4 goats", "4",
"4 groups", "",
"4 or 5", "4",
"4 oxen", "4",
"4 sheep", "4",
"4 to 8", "6",
"4/6/2013", "",
"40 bag", "40",
"40 cattle", "40",
"40 sacks", "40",
"40 sheep", "40",
"40 to 50", "45",
"40/50", "45",
"400 cattle", "400",
"4000", "",
"44 cattle", "44",
"5 bags", "5",
"5 calves", "5",
"5 cattle", "5",
"5 destroyed", "5",
"5 goats", "5",
"5 killed", "5",
"5 or 6", "5",
"5 sheep, 1 ox", "6",
"5 sheep", "5",
"5 to 6", "5",
"5/10/2013", "",
"5/6/2013", "",
"50 cattle", "50",
"50 to 60", "55",
"50-100", "75",
"50-60", "55",
"50-75", "62",
"50+", "50",
"50+", "50",
"5000 acres", "5000",
"519 +", "519",
"53 detained", "53",
"54 sheep and goats", "54",
"56 committee members", "56",
"6 bag", "6",
"6 bags", "6",
"6 cattle", "6",
"6 cattle", "6",
"6 goats", "6",
"6 or 7", "6",
"6 sheep and goats", "6",
"6 sheep", "6",
"6 to 7", "6",
"6 to 8", "7",
"6 to 9", "8",
"6-8 man", "7",
"6/10/2013", "",
"6/8/2013", "",
"60-100", "80",
"60-70", "65",
"64 cattle", "64",
"7 bags", "7",
"7 cattle", "7",
"7 sheep", "7",
"7/10/2013", "",
"70 bags", "70",
"70 cattle, sheep", "70",
"70-100", "85",
"70000", "",
"75 rounds", "75",
"8 bags potatoes", "8",
"8 cattle", "8",
"8 cows slashed", "8",
"8 cows", "8",
"8 sheep", "8",
"8 to 10", "9",
"8/10/2013", "",
"80 cattle", "80",
"80-100", "90",
"84 sheep, 1 cow, 5 chickens", "90",
"9 cattle", "9",
"9 sheep", "9",
"9 to 10", "9",
"9+9", "18",
"900(not clear)", "900",
"all locals", "",
"all", "",
"app 5", "5",
"app. 100", "100",
"app. 120", "120",
"armed gang", agang,
"band", agang,
"bands", "",
"cattle slashing", "",
"clothing", "",
"considerable quantity", "",
"fairly large gang", agang_large,
"few bags", "",
"few", "",
"food", "",
"gang", agang,
"gangs", agang_large,
"guards", afew,
"half village", "",
"labour", "",
"large crowd", "",
"large force", agang_large,
"large gang", agang_large,
"large meeting", "",
"large number", "",
"large numbers", "",
"large quantities", "",
"large quantity", "",
"large re-oathing ceremony", "",
"large scale", "",
"large", agang_large,
"largish gang", agang_large,
"local populace", "",
"many thousand", "2000",
"mob", "",
"not given", "",
"number", "",
"occupants", "",
"over 200", "200",
"party", "",
"party", agang,
"patrol", agang,
"posho", "",
"potatoes", "",
"quantity of clothing", "",
"section", "",
"several gangs", agang_large,
"several", "3",
"sheep and goats", "",
"shs 2,300/-", "2300",
"shs 60/-", "60",
"shs. 1,000", "1000",
"shs. 18", "18",
"shs. 30", "30",
"small gang", agang,
"small gangs", agang,
"small group", agang,
"small party", afew,
"small", agang,
"some", afew,
"sufficient food", "",
"unknown", "",
"very large gang", agang_large,
"villages in ndia, gichugu, embu divisions", "",
"wives", ""
)
recodings <- matrix(recodings, ncol = 2, byrow = TRUE)
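Because recodings is typed by hand as alternating old/new values, the same key can end up listed twice, sometimes with different replacements ("party" is mapped both to "" and to agang above). A small check along these lines, shown on a made-up table rather than the real one, could flag such keys before recoding.

```r
# Toy lookup laid out as old/new pairs, in the same shape as `recodings`
pairs <- c(
  "1 looted", "1",
  "party",    "",
  "1 looted", "1",
  "party",    "6"
)
tab <- matrix(pairs, ncol = 2, byrow = TRUE)

# Keys that occur more than once in the first column
dup_keys <- unique(tab[duplicated(tab[, 1]), 1])

# Of those, keys whose listed replacements disagree with each other
has_conflict <- vapply(
  dup_keys,
  function(k) length(unique(tab[tab[, 1] == k, 2])) > 1,
  logical(1)
)
conflicts <- dup_keys[has_conflict]

print(dup_keys)
print(conflicts)
```

Harmless duplicates (same key, same replacement) and genuine conflicts are reported separately, since only the latter require a decision about which value to keep.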
events$initiator_numbers_numeric <- events$initiator_numbers %>% recoderFunc(., recodings[, 1], recodings[, 2]) %>% as.numeric()
Warning (4 times): number of items to replace is not a multiple of replacement length
Warning: NAs introduced by coercion
events$target_numbers_numeric <- events$target_numbers %>% recoderFunc(., recodings[, 1], recodings[, 2]) %>% as.numeric()
Warning (5 times): number of items to replace is not a multiple of replacement length
Warning: NAs introduced by coercion
events$affected_count_numeric <- events$affected_count %>% recoderFunc(., recodings[, 1], recodings[, 2]) %>% as.numeric()
Warning (5 times): number of items to replace is not a multiple of replacement length
Warning: NAs introduced by coercion
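as.numeric() turns every string it cannot parse into NA and reports only that NAs were introduced, not which values caused them. A rough way to list the offending strings, assuming empty strings are intentional placeholders for missing counts, is sketched below on invented values.

```r
# Toy values standing in for a recoded count column (not real data)
recoded <- c("3", "", "12", "agang_large", "about 40", NA)

# Convert quietly so the check itself does not emit the coercion warning
num <- suppressWarnings(as.numeric(recoded))

# Non-missing, non-empty strings that still failed to convert
failed <- !is.na(recoded) & recoded != "" & is.na(num)
unconverted <- unique(recoded[failed])

print(unconverted)
```

Each string reported this way is either missing from the lookup table or mapped to something that is not a number, such as a variable name left inside quotes.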
events[, c(
"government_killed_clean", "government_wounded_clean", "government_captured_clean",
"rebels_killed_clean", "rebels_wounded_clean", "rebels_captured_clean",
"civilians_killed_clean", "civilians_wounded_clean", "civilians_captured_clean"
)] <-
events[, c(
"government_killed", "government_wounded", "government_captured",
"rebels_killed", "rebels_wounded", "rebels_captured",
"civilians_killed", "civilians_wounded", "civilians_captured"
)]
events <- events %>% mutate_at(
.vars = c(
"government_killed_clean", "government_wounded_clean", "government_captured_clean",
"rebels_killed_clean", "rebels_wounded_clean", "rebels_captured_clean",
"civilians_killed_clean", "civilians_wounded_clean", "civilians_captured_clean"
),
funs(as.numeric(car::recode(., " 'Few'='2';'Many'='3';'others'='2';'Sevaral'='3';
'several'='3'; 'Several More'='3'; 'Several others'='3';
'Some'='3';
'100+'='100'; '23 Families'='23'; '28 families'='28'; '30-40'='35';
'50+'='50'; 'Council of elders'='3';
'Council of war'='3'; 'some'='2';
'Several'='3'; '4500'='45'; '800'='80'; 'Gang'='3'; 'Majority'='3';
'many'='3' ; 'Small gang'='3' ;
'6+'='6' ; '10+'='10' ; '3+'='3';
'unKnown'='1'; 'unknown'='1'; 'UnKnown'='1'; 'UNKNOWN'='1'; 'Unkown'='1';
'Unknown'='1' ; 'Number'='1';'More'='1'; '10197'='' ; '101'='1' ;
'48'='7' ; '146'='1' ; '122'='1'; '208'='1'; '94'='1' ;
NA=0")))
)
Warning (repeated): NAs introduced by coercion
events <- events %>% mutate_at(.vars = c(
"government_killed_clean", "government_wounded_clean", "government_captured_clean",
"rebels_killed_clean", "rebels_wounded_clean", "rebels_captured_clean",
"civilians_killed_clean", "civilians_wounded_clean", "civilians_captured_clean"
), funs(as.numeric))
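The columns are passed through as.numeric() a second time here. That is harmless for character or numeric columns, but if any of them were stored as factors (the source does not say), as.numeric() would return the internal level codes instead of the printed numbers. The toy example below illustrates the difference.

```r
# Toy factor whose labels look like numbers
f <- factor(c("10", "3", "10", "25"))

# Direct conversion returns the level codes (levels sort as "10", "25", "3")
codes <- as.numeric(f)

# Going through character first returns the numbers that are printed
values <- as.numeric(as.character(f))

print(codes)
print(values)
```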
events <- events %>%
mutate(rebels_killedwounded_clean = rebels_killed_clean + rebels_wounded_clean) %>%
mutate(government_killed_wounded_clean = government_killed_clean + government_wounded_clean) %>%
mutate(rebels_government_killedwounded_clean = rebels_killed_clean + rebels_wounded_clean + government_killed_clean + government_wounded_clean) %>%
mutate(rebels_government_killed_clean = rebels_killed_clean + government_killed_clean) %>%
mutate(rebels_government_civilians_killed_clean = rebels_killed_clean + government_killed_clean + civilians_killed_clean)
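The combined counts are built with +, so a missing value in either component makes the whole sum missing. If treating a missing count as zero is acceptable, which is an assumption and not necessarily what the authors intend, rowSums() with na.rm = TRUE is one way to do it. The data frame toy is invented for illustration.

```r
# Toy casualty columns with some missing entries (not real data)
toy <- data.frame(
  rebels_killed_clean     = c(2, NA, 0),
  government_killed_clean = c(1, 3, NA)
)

# `+` propagates NA: any missing component makes the sum missing
sum_plus <- toy$rebels_killed_clean + toy$government_killed_clean

# rowSums(na.rm = TRUE) skips missing components, i.e. treats them as zero
sum_narm <- unname(rowSums(
  toy[, c("rebels_killed_clean", "government_killed_clean")],
  na.rm = TRUE
))

print(sum_plus)
print(sum_narm)
```

Which behaviour is preferable depends on whether a missing count means "none reported" or "not known"; the second option silently assumes the former.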
events %>% crosstab(initiator_clean_1_agghigh, type_clean_agghigh) %>% adorn_crosstab(digits = 1)
events %>% crosstab(target_clean_1_agghigh, type_clean_agghigh) %>% adorn_crosstab(digits = 1)
events %>% crosstab(target_clean_1_agghigh, initiator_clean_1_agghigh) %>% adorn_crosstab(digits = 1)
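For readers without the janitor package, a cross-tabulation of this kind can be approximated in base R with table() and prop.table(). The sketch below uses invented data and computes row percentages; whether that matches the denominator used by adorn_crosstab() in the installed janitor version is an assumption.

```r
# Toy events with an initiator and an event type (not real data)
toy <- data.frame(
  initiator = c("rebels", "rebels", "government", "rebels"),
  type      = c("raid", "theft", "patrol", "raid"),
  stringsAsFactors = FALSE
)

# Counts of each initiator/type combination
counts <- table(toy$initiator, toy$type)

# Row percentages: share of each type within an initiator, one decimal place
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

print(counts)
print(row_pct)
```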
saveRDS(events, "/home/rexdouglass/Dropbox (rex)/Kenya Article Drafts/MeasuringLandscapeCivilWar/inst/extdata/MeasuringLandscapeCivilWar_events_cleaned.Rdata")
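Note that saveRDS() writes a single R object in RDS format regardless of the file extension, so a file saved this way is read back with readRDS() rather than load(). A minimal round trip, using a temporary file and a made-up data frame instead of the path above, looks like this.

```r
# Toy stand-in for the cleaned events table (not real data)
toy_events <- data.frame(
  id   = 1:3,
  type = c("raid", "patrol", "raid"),
  stringsAsFactors = FALSE
)

# Write to a temporary file so nothing outside the session is touched
path <- tempfile(fileext = ".rds")
saveRDS(toy_events, path)

# readRDS() returns the object, which can be assigned to any name
restored <- readRDS(path)
same <- identical(toy_events, restored)
print(same)
```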